The European Bioinformatics Institute (EMBL-EBI) is an intergovernmental organization (IGO) which, as part of the European Molecular Biology Laboratory (EMBL) family, focuses on research and services in bioinformatics. It is located on the Wellcome Genome Campus in Hinxton near Cambridge, and employs over 600 full-time equivalent (FTE) staff. Institute leaders such as Rolf Apweiler, Alex Bateman, Ewan Birney, and Guy Cochrane, an adviser on the National Genomics Data Center Scientific Advisory Board, serve as part of the international research network of the BIG Data Center at the Beijing Institute of Genomics. Additionally, the EMBL-EBI hosts training programs that teach scientists the fundamentals of working with biological data and promote the wide range of bioinformatic tools available for their research, both EMBL-EBI and non-EMBL-EBI-based.

Bioinformatic services

One of the roles of the EMBL-EBI is to index and maintain biological data in a set of databases, including Ensembl (housing whole genome sequence data), UniProt (a protein sequence and annotation database) and the Protein Data Bank (a database of protein and nucleic acid tertiary structures). A variety of online services and tools are also provided for further data analysis, such as the Basic Local Alignment Search Tool (BLAST) and the Clustal Omega sequence alignment tool.

BLAST

BLAST is an algorithm for comparing the primary structure of biomacromolecules, most often the nucleotide sequences of DNA/RNA and the amino acid sequences of proteins, stored in bioinformatic databases, with a query sequence. The algorithm scores the available sequences against the query using a scoring matrix such as BLOSUM62. The highest-scoring sequences represent the closest relatives of the query in terms of functional and evolutionary similarity. A BLAST database search requires the input data to be in a supported format (e.g. FASTA, GenBank, PIR or EMBL format). Before running the tool, users may also designate the specific databases to be searched and select the scoring matrix and other parameters. The best hits in the BLAST results are ordered according to their calculated E value (the number of hits with an equal or higher score that would be expected in the database by chance).
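The steps described above (reading FASTA input, scoring database sequences against a query, and ordering hits by E value) can be illustrated with a minimal sketch. It is not the BLAST implementation: the match/mismatch scheme stands in for a real matrix such as BLOSUM62, the comparison is ungapped and position-by-position, and the `lam` and `k` defaults are placeholder values chosen for illustration. The E value uses the Karlin-Altschul estimate E = K·m·n·e^(−λS), where m is the query length and n the database length. All function names are hypothetical.

```python
import math


def parse_fasta(text):
    """Parse FASTA text into {header: sequence}; a line starting with '>' opens a record."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def ungapped_score(query, subject, match=2, mismatch=-1):
    """Sum per-position scores for two equal-length sequences (toy scheme, not BLOSUM62)."""
    if len(query) != len(subject):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    return sum(match if q == s else mismatch for q, s in zip(query, subject))


def expected_hits(score, query_len, db_len, lam, k):
    """Karlin-Altschul estimate E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * score)


def rank_hits(query, database, lam=0.3, k=0.1):
    """Return (name, score, e_value) tuples sorted by ascending E value.

    lam and k are illustrative placeholders; real values depend on the scoring system.
    """
    db_len = sum(len(seq) for seq in database.values())
    hits = []
    for name, seq in database.items():
        score = ungapped_score(query, seq)
        hits.append((name, score, expected_hits(score, len(query), db_len, lam, k)))
    return sorted(hits, key=lambda hit: hit[2])
```

A lower E value means a hit is less likely to be a chance match, which is why the ranking sorts in ascending order.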

Clustal Omega

Clustal Omega is a multiple sequence alignment (MSA) tool that aligns at least three and at most 4000 input DNA or protein sequences. The Clustal Omega algorithm employs two profile hidden Markov models (HMMs) to derive the final alignment of the sequences. The output of Clustal Omega may be visualized as a guide tree (the phylogenetic relationship of the best-pairing sequences) or ordered by the mutual sequence similarity between the queries. The main advantage of Clustal Omega over other MSA tools (MUSCLE, ProbCons) is its efficiency, achieved while maintaining good accuracy of the results.
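To make "mutual sequence similarity" concrete, the sketch below computes pairwise percent identity between rows of an already-computed alignment. This is an illustrative measure only, not Clustal Omega's internal method; the function names and the choice to skip any column containing a gap are assumptions made for the example.

```python
def percent_identity(a, b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the comparison
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment):
    """Map every ordered pair of sequence names to their percent identity."""
    names = list(alignment)
    return {
        (m, n): percent_identity(alignment[m], alignment[n])
        for m in names
        for n in names
    }
```

For example, `percent_identity("AC-GT", "ACAGA")` compares four ungapped columns, three of which match. Other definitions of identity (for instance, dividing by the full alignment length) give different numbers for the same alignment.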

Ensembl

Based at the EMBL-EBI, Ensembl is a database organized around genomic data, maintained by the Ensembl Project. Tasked with the continuous annotation of the genomes of model organisms, Ensembl provides researchers with a comprehensive resource of relevant biological information about each genome. The annotation of the stored reference genomes is automatic and sequence-based. Ensembl includes a publicly available genome database that can be accessed via a web browser. The stored data can be explored through a graphical interface that displays them at multiple levels of resolution, from karyotype, through individual genes, to nucleotide sequence. Originally centered on vertebrates, since 2009 Ensembl has also provided annotated genome data for plants, fungi, invertebrates, bacteria and other species through the sister project Ensembl Genomes. As of 2020, the various Ensembl project databases together house over 50 000 reference genomes.

PDB

PDB is a database of three-dimensional structures of biological macromolecules, such as proteins and nucleic acids. The data are typically obtained by X-ray crystallography or NMR spectroscopy, and submitted manually by structural biologists worldwide through the PDB member organizations: PDBe, RCSB, PDBj and BMRB. The database can be accessed through the webpages of its members, including PDBe (housed at the EMBL-EBI). As a member of the wwPDB consortium, PDBe aids in the joint mission of archiving and maintaining macromolecular structure data.

UniProt

UniProt is an online repository of protein sequence and annotation data, distributed across the UniProt Knowledgebase (UniProtKB), UniProt Reference Clusters (UniRef) and UniProt Archive (UniParc) databases. It began as separate ventures of the EMBL-EBI and the Swiss Institute of Bioinformatics (SIB) (together maintaining Swiss-Prot and TrEMBL) and of the Protein Information Resource (PIR) (housing the Protein Sequence Database); the growth in global protein data generation led to their collaboration in the creation of UniProt in 2002. The protein entries stored in UniProt are cataloged by a unique UniProt identifier. The annotation data collected for each entry are organized in logical sections (e.g. protein function, structure, expression, sequence or relevant publications), giving a coordinated overview of the protein of interest. Links to external databases and original sources of data are also provided. In addition to standard search by protein name or identifier, the UniProt webpage offers tools for BLAST searching, sequence alignment and searching for proteins containing specific peptides.
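As an illustration of how entries can be referenced by identifier, the sketch below checks whether a string has the shape of a UniProtKB accession number (for example `P12345`). The regular expression follows the accession format described in UniProt's help pages, reproduced here as an assumption that should be checked against the current documentation. The function name is hypothetical, and a successful match only means the string is well-formed, not that the entry exists.

```python
import re

# Assumed accession format: 6 characters, or 10 characters for the longer form.
ACCESSION_PATTERN = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_accession(text):
    """Return True if the whole string matches the assumed accession format."""
    return ACCESSION_PATTERN.fullmatch(text) is not None
```

Such a check is useful for validating user input before sending a query, so that malformed identifiers are rejected locally.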

Other bioinformatics organisations

* National Center for Biotechnology Information (NCBI), United States National Library of Medicine
* National Institute of Genetics (DNA Data Bank of Japan)
* Swiss Institute of Bioinformatics (SIB: Expasy)
* Australia Bioinformatics Resource
* BIG Data Center (National Genomics Data Center), Beijing Institute of Genomics, Chinese Academy of Sciences

